For this project, I created both songs using StableAudio.com, after experimenting with several AI music generators. While I tested nearly every option on the Fairly Trained site, StableAudio.com ultimately proved to be the most effective for my needs. To compose the tracks, I used a variety of keywords aimed at crafting a chill guitar vibe. Both songs are instrumental, as I find AI-generated vocals too robotic, and I also can’t sing myself, so I chose to leave the vocals out. For Song 1, I used keywords such as Post-Rock, Euphoric, and Sentimental (125 BPM), while Song 2 features a 3/4 beat with Bright, Happy, and Surf vibes.
As someone without a background in musicology or AI, my approach to this project may not be technically perfect, and my formatting could have been improved. I would have liked to spend more time refining the layout to ensure that all graphs and visual elements are clearly displayed. Nevertheless, the title of my portfolio centers around the dynamics of AI-generated tracks, and I aimed to explore this through various graphical analyses we learned about in the lectures.
To begin, I analyze the submissions from all the students, including both AI-generated and non-AI tracks, visualizing the distribution using a scatterplot that focuses on arousal and tempo. These aspects seemed the most interesting to me, especially as I sought to investigate the dynamic qualities of the tracks. I felt that arousal, tempo, and danceability were key factors to study in this context. Following this, I examine chromagrams to assess the note-to-frequency relationship within the tracks, followed by chordograms to explore the time-to-chord dynamics. Next, I analyze the self-similarity matrix, where the tracks are compared with themselves in terms of chroma and timbre. Afterward, I look into cepstrograms to study changes in the tracks, focusing on energy novelty and spectral novelty. Additionally, I explore tempograms, which track tempo changes in beats per minute (BPM) over time. There’s also a heatmap that clusters all the tracks in the corpus, offering a comprehensive overview. Finally, a conclusion is drawn regarding the overall dynamics of the AI-generated tracks.
The scatterplot reveals an interesting distribution of tempo and arousal across different tracks. The average tempo is around 100 BPM, with an average arousal level of 5. Most tracks cluster around this point, suggesting a balance between energy and pace. However, there are a few notable outliers: Some tracks exhibit a very low tempo (30 BPM) but still maintain a moderate arousal level of 3.5. This could indicate slow but emotionally engaging pieces.
On the other end of the spectrum, there are tracks with a higher-than-average tempo, yet with a relatively low arousal of 3. These might be fast-paced but more controlled or less intense in emotional expression. Interestingly, the track with the highest arousal (7) did not stand out in terms of speed—it maintained an average tempo, suggesting that factors beyond just BPM contribute to a track’s perceived energy.
Looking at my own compositions: Track 1 has a tempo of 86 BPM and an arousal of 4.8, both slightly below the dataset’s average. Track 2 is quite similar, with a tempo of 83 BPM and an arousal of 4.9. Compared to the rest of the dataset, most tracks tend to have both a higher tempo and a higher arousal than mine, positioning my tracks in a slightly more controlled range. Overall, the scatterplot provides a clear visual of how different tracks balance tempo and energy, highlighting variations in musical intensity and pacing.
The scatterplot with the addition of danceability paints a similar picture to the first, with some interesting new trends. Overall, the relationship between arousal and danceability is quite clear: Most tracks with higher arousal also score high on danceability, indicating that energetic tracks tend to be more danceable. Similarly, tracks with lower arousal tend to have lower danceability, suggesting that less intense music may be less suitable for dancing or more ambient in nature.
One particularly interesting outlier stands out: A track with an arousal of around 6, a tempo of around 110 BPM, but with very low danceability (less than 0.50). This suggests that while the track has a moderately high energy level and tempo, it might lack the rhythmic or structural elements needed to be considered highly danceable, perhaps due to complex rhythms or other features that make it less suited for dancing.
As for my own tracks: Track 1 has an extremely low danceability score of 0.17, indicating that it’s likely not very rhythmically driven or groove-oriented. Track 2 has a higher danceability at 0.29, though still relatively low compared to others. Interestingly, I personally find Track 2 to be much more monotonous and less engaging than Track 1, yet the danceability score does not fully align with my perception of the tracks. This might suggest that the danceability metric isn’t always in line with subjective experiences of musical engagement—it could be influenced by factors like rhythm complexity or musical structure that I find less compelling, but which still contribute to a higher score.
This graph is vibrant, indicating a high level of tonal diversity in the track. The note G appears only in the middle of the song, while F and D# are scarcely present. At the beginning, E, F#, A, and B dominate, suggesting that the song is likely in E major. The pattern is clear and consistent, giving the song a somewhat monotonous feel, which aligns with the listening experience.
The chromagram of this track reveals a strong presence of the notes E, F#, A, and B, clearly indicating that the track is primarily in E major. The pattern across the chromagram is clear and consistent, with these notes dominating throughout, which reinforces the harmonic stability of the piece.
The repetitive nature of the chromagram suggests that the track follows a relatively monotonous structure, as the same notes persist without much deviation. This consistent pattern aligns with the overall feel of the music—there are no significant changes in tonality or harmony, which contributes to a sense of predictability and uniformity.
When listening to the track, this eentonigheid (monotony) is immediately noticeable, and the chromagram perfectly reflects this characteristic. The lack of harmonic variation creates a stable but somewhat predictable atmosphere, where the tension and release typical of more complex harmonic progressions are notably absent.
This chromagram reveals a rich harmonic landscape. Recurring patterns suggest a steady chord progression, while the dominance of E and B hints at a key centered around E major or E minor. The presence of multiple active notes indicates complex harmonies. Moments of lower activity suggest brief rests, creating contrast. Towards the end, increased intensity in higher pitches signals a climax or modulation, building momentum before the resolution. Through this visual representation, we glimpse not just the notes but the energy and movement that shape the music’s emotional flow.
The chordogram of track 1 offers insight into the harmonic structure of the track, allowing us to verify if E major truly dominates. Interestingly, E major appears as one of the darkest regions, indicating that it is actually one of the least prevalent chords. This could suggest that the track frequently shifts between different keys, creating a dynamic and ever-changing harmonic landscape. Another possible explanation is that the presence of background noise—possibly introduced by the AI—might obscure the clarity of the chord progressions, making it harder to discern the dominant tonality. Regardless, the chordogram highlights the complexity and fluidity of the harmonic transitions within the track.
In Track 2, once again, E major is notably absent, which is surprising, especially considering that the chromagrams indicated it as the dominant chord. Instead, almost every possible chord appears throughout the track, creating a vibrant and harmonically rich landscape. This could suggest that the track features a variety of instruments playing simultaneously, contributing to a sense of fullness. However, this dense layering of sounds may also result in background noise and disharmony, which is reflected in the chordogram. When listening to the track, this becomes even more apparent, as the presence of excessive noise makes it less pleasant and harder to follow, resulting in a less enjoyable listening experience.
The chroma-based self-similarity matrix of Track 1 reveals a fascinating and evolving harmonic structure. In the beginning, a diagonal pattern emerges, indicating a repetitive chord progression. This suggests that the track starts with a steady and predictable harmonic foundation, where the chords follow a familiar, consistent pattern. As the track progresses, the pattern shifts to a checkerboard arrangement, signaling a transition to alternating repetition. While still built on repetition, this new structure introduces variation within the repeated sections, adding a dynamic and evolving feel to the music. This shift reflects a move from simplicity to complexity, creating a sense of progression while maintaining the underlying structural familiarity.
The chroma-based self-similarity matrix of Track 2 reveals a clear checkerboard pattern, indicating that the track features repeated notes, but with a twist. Rather than straightforward repetition, the matrix shows an alternating repetition of notes, where the musical elements shift and evolve while maintaining their core structure. This pattern suggests a dynamic balance between familiarity and variation, creating a sense of continuity while introducing enough change to keep the music engaging and unpredictable.
The Timbre-based Self-Similarity Matrix of Track 1 reveals some intriguing contrasts in timbral richness throughout the track. In the intro and especially the outro, there is a noticeable increase in timbre, indicating that these sections of the track likely feature a broader range of instruments. This enhanced timbral complexity in the beginning and end contrasts with the more homogeneous middle section, which is notably more monotonous. The overall timbre here supports the idea that the track is eentonig (monotonous) in the middle, aligning with the pattern seen in the chroma-based self-similarity matrix. This suggests that the middle portion of the track relies on fewer instrumental variations, while the intro and outro offer a more dynamic and layered sound.
In Track 2, the Timbre-based Self-Similarity Matrix shows a slight increase in timbral complexity during the intro and outro, similar to Track 1, but this peak is far less pronounced. The track overall features minimal timbral variation, with much of the piece exhibiting a lighter, more sparse texture. This is in contrast to Track 1, where the timbre is more dynamic in the intro and outro. Interestingly, the lighter timbre of Track 2 could suggest the use of fewer instruments or a simpler arrangement, leading to a less dense overall sound compared to Track 1. However, it could also indicate a different approach to instrument layering and arrangement, resulting in a more subtle variation in timbre throughout the track.
There isn’t much to observe in this cepstogram, but you can notice a difference in timbre between the beginning and the end of the track. The song is monotonous, resulting in few relative variations in timbre throughout.
The cepstogram of track 2 closely resembles that of track 1, with the difference in timbre being even less pronounced here. This could also be due to the song being AI-generated, resulting in fewer dynamic musical structures. To zoom in on the cepstogram, we use the energy-based and spectral-based novelty functions.
The energy-based novelty function for Track 1 shows a peak around 10 seconds, which may suggest a rise in instrumental activity at that point in the track. After this initial peak, the track shows very few significant changes in energy, indicating a lack of energetic variation throughout the rest of the piece. Additionally, since the function uses a small window, the peaks aren’t particularly pronounced, further emphasizing the relatively stable energy levels within the track.
In contrast, the energy-based novelty function for Track 2 displays more peaks compared to Track 1, although these peaks are much smaller due to the smaller window used. This results in fewer significant peaks overall compared to Track 1. The largest peak occurs around 90 seconds, which suggests a notable change in energy at that point. However, due to the smaller window, the energy fluctuations are less pronounced and more fragmented.
The spectral-based novelty function for Track 1 reveals changes in frequency, indicating shifts in pitches and melodies throughout the track. Unlike other analyses that suggested a more monotonous structure, this function doesn’t appear as consistent, possibly due to the small window used, which makes the frequency changes seem more pronounced than they actually are. This suggests that while the track might seem to have more spectral variation, it could just be a result of the narrow window, amplifying smaller shifts in pitch and melody that would otherwise go unnoticed.
Similarly, the spectral-based novelty function for Track 2 follows a pattern similar to Track 1, appearing monotonous overall. At the beginning, there are noticeable peaks, but around 120 seconds, the function shows lower valleys, indicating a drop in spectral variation. As with Track 1, the small window leads to less significant spectral changes, so while it may appear there are variations, these fluctuations are actually quite subtle and don’t reflect major shifts in the overall spectral content.
The tempogram for Track 1 shows a consistent pattern, which could be a result of the AI-generated nature of the track. There is a slight change around 80 seconds, but overall, the tempo remains quite steady. Interestingly, the consistency in tempo is more typical of genres like EDM, House, and Dance, which usually feature a regular beat throughout. However, this track doesn’t align with those genres, making it somewhat unusual and unexpected to see such a consistent tempo pattern here.
The tempogram for Track 2 is even more consistent, with very little change throughout the track, suggesting an overall monotonous tempo. There are few accents or variations, making the track feel quite steady and predictable in terms of rhythm. This lack of tempo changes further emphasizes the uniformity of the track, reinforcing a sense of stability, though it might also contribute to a feeling of sameness.
Truth
Prediction AI Non-AI
AI 35 18
Non-AI 14 23
# A tibble: 2 Ă— 3
class precision recall
<fct> <dbl> <dbl>
1 AI 0.7 0.714
2 Non-AI 0.65 0.634
From the various analyses, it’s clear that Track 1 and Track 2 exhibit distinct differences in dynamics and musical complexity. Track 1 displays a relatively dynamic and evolving structure. The chroma-based self-similarity matrix reveals a steady chord progression at the start, with alternating repetition towards the middle, indicating some level of variation. The energy-based novelty function shows a peak early on, followed by minimal changes, suggesting a burst of activity at the beginning and stability later. Furthermore, the spectral-based novelty function indicates frequent changes in pitch and melody, especially in the intro and outro, hinting at a more instrumentally rich arrangement. However, the tempogram presents a consistent pattern, likely influenced by the AI generation, which creates a steady rhythm more common in genres like EDM and House.
In contrast, Track 2 presents a more monotonous and static dynamic throughout. The chroma-based self-similarity matrix shows a highly repetitive harmonic structure without much tonal variation. The energy-based novelty function reveals some small peaks and a noticeable burst around 90 seconds, but overall, the track remains fairly stable. The spectral-based novelty function indicates minimal spectral variation, with small fluctuations, contributing to a more uniform texture. The tempogram for Track 2 is even more consistent than Track 1, with very few changes in tempo, reinforcing its monotony.
Overall, while Track 1 features some dynamic shifts and contrasts in both timbre and tempo, Track 2 remains far more uniform and stable throughout. The overall dynamics of Track 1 suggest more musical complexity and variation, whereas Track 2 presents a more steady and predictable experience, with fewer changes in energy, pitch, and rhythm.